Mixture Density

In probability and statistics, a mixture distribution is the probability distribution of a random variable that is derived from a collection of other random variables as follows: first, a random variable is selected by chance from the collection according to given probabilities of selection, and then the value of the selected random variable is realized. The underlying random variables may be random real numbers, or they may be random vectors (each having the same dimension), in which case the mixture distribution is a multivariate distribution.

In cases where each of the underlying random variables is continuous, the outcome variable will also be continuous and its probability density function is sometimes referred to as a mixture density. The cumulative distribution function (and the probability density function if it exists) can be expressed as a convex combination (i.e. a weighted sum, with non-negative weights that sum to 1) of other distribution functions and density functions. The individual distributions that are combined to form the mixture distribution are called the mixture components, and the probabilities (or weights) associated with each component are called the mixture weights. The number of components in a mixture distribution is often restricted to being finite, although in some cases the components may be countably infinite in number. More general cases (i.e. an uncountable set of component distributions), as well as the countable case, are treated under the title of compound distributions.

A distinction needs to be made between a random variable whose distribution function or density is the weighted sum of a set of components (i.e. a mixture distribution) and a random variable whose value is the sum of the values of two or more underlying random variables, in which case the distribution is given by the convolution operator. As an example, the sum of two jointly normally distributed random variables, each with different means, will still have a normal distribution. On the other hand, a mixture density created as a mixture of two normal distributions with different means will have two peaks provided that the two means are far enough apart, showing that this distribution is radically different from a normal distribution.

Mixture distributions arise in many contexts in the literature and arise naturally where a statistical population contains two or more subpopulations. They are also sometimes used as a means of representing non-normal distributions. Data analysis concerning statistical models involving mixture distributions is discussed under the title of mixture models, while the present article concentrates on simple probabilistic and statistical properties of mixture distributions and how these relate to properties of the underlying distributions.
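The two-step construction, and the contrast between a mixture and a sum of random variables, can be illustrated with a small simulation. This is only a sketch under stated assumptions: the two components N(-3, 1) and N(3, 1), the equal weights, the seed, and the helper names `sample_mixture` and `fraction_near_zero` are all illustrative choices, not part of the article.

```python
import random


def sample_mixture(weights, samplers, rng):
    """Two-step draw: choose a component by its weight, then sample from it."""
    (chosen,) = rng.choices(samplers, weights=weights, k=1)
    return chosen(rng)


def fraction_near_zero(draws, half_width=1.0):
    """Share of draws that land in the open interval (-half_width, half_width)."""
    return sum(abs(x) < half_width for x in draws) / len(draws)


rng = random.Random(0)  # fixed seed, chosen only for reproducibility
components = [
    lambda r: r.gauss(-3.0, 1.0),  # hypothetical component 1: N(-3, 1)
    lambda r: r.gauss(3.0, 1.0),   # hypothetical component 2: N(+3, 1)
]
weights = [0.5, 0.5]
n_draws = 20_000

# Mixture: each draw comes from exactly one of the two components.
mixture_draws = [sample_mixture(weights, components, rng) for _ in range(n_draws)]
# Sum: each draw adds one value from each component (a convolution).
sum_draws = [components[0](rng) + components[1](rng) for _ in range(n_draws)]

mixture_share = fraction_near_zero(mixture_draws)
sum_share = fraction_near_zero(sum_draws)
print(f"share of draws within 1 of zero: mixture={mixture_share:.3f}, sum={sum_share:.3f}")
```

With these assumed components the mixture places very little mass near zero (the valley between its two peaks), whereas the sum is itself normal and centred at zero, so it places most of its mass there.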


Finite and countable mixtures

Given a finite set of probability density functions ''p''1(''x''), ..., ''pn''(''x''), or corresponding cumulative distribution functions ''P''1(''x''), ..., ''Pn''(''x''), and weights ''w''1, ..., ''wn'' such that ''wi'' ≥ 0 and ∑''wi'' = 1, the mixture distribution can be represented by writing either the density, ''f'', or the distribution function, ''F'', as a sum (which in both cases is a convex combination):
: F(x) = \sum_{i=1}^n \, w_i \, P_i(x),
: f(x) = \sum_{i=1}^n \, w_i \, p_i(x) .
This type of mixture, being a finite sum, is called a finite mixture, and in applications, an unqualified reference to a "mixture density" usually means a finite mixture. The case of a countably infinite set of components is covered formally by allowing n = \infty\! .
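As a minimal sketch of these two sums, the following evaluates the density and distribution function of a finite mixture. It assumes normal components purely for illustration; the function names, weights and parameters are hypothetical and not taken from the article.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    """Distribution function of N(mu, sigma^2) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def check_weights(weights, tol=1e-12):
    """Enforce the convex-combination conditions: w_i >= 0 and sum(w_i) = 1."""
    if any(w < 0.0 for w in weights):
        raise ValueError("mixture weights must be non-negative")
    if abs(sum(weights) - 1.0) > tol:
        raise ValueError("mixture weights must sum to 1")


def mixture_pdf(x, weights, params):
    """f(x) = sum_i w_i p_i(x), with each p_i a normal density."""
    check_weights(weights)
    return sum(w * normal_pdf(x, mu, s) for w, (mu, s) in zip(weights, params))


def mixture_cdf(x, weights, params):
    """F(x) = sum_i w_i P_i(x), with each P_i a normal distribution function."""
    check_weights(weights)
    return sum(w * normal_cdf(x, mu, s) for w, (mu, s) in zip(weights, params))


# Illustrative two-component mixture: (mean, standard deviation) per component.
weights = [0.3, 0.7]
params = [(-1.0, 1.0), (2.0, 0.5)]
print(mixture_pdf(0.0, weights, params), mixture_cdf(0.0, weights, params))
```

Validating the weights up front keeps the result a genuine convex combination; other component families could be substituted by swapping the two `normal_*` helpers.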


Uncountable mixtures

Where the set of component distributions is uncountable, the result is often called a compound probability distribution. The construction of such distributions has a formal similarity to that of mixture distributions, with either infinite summations or integrals replacing the finite summations used for finite mixtures.

Consider a probability density function ''p''(''x'';''a'') for a variable ''x'', parameterized by ''a''. That is, for each value of ''a'' in some set ''A'', ''p''(''x'';''a'') is a probability density function with respect to ''x''. Given a probability density function ''w'' (meaning that ''w'' is nonnegative and integrates to 1), the function
: f(x) = \int_A \, w(a) \, p(x;a) \, da
is again a probability density function for ''x''. A similar integral can be written for the cumulative distribution function. Note that the formulae here reduce to the case of a finite or countably infinite mixture if the density ''w'' is allowed to be a generalized function representing the "derivative" of the cumulative distribution function of a discrete distribution.
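The integral form can be checked numerically on a case with a known closed form. The sketch below is an illustration under assumptions not made in the article: it takes ''p''(''x'';''a'') to be an exponential density with rate ''a'' and ''w'' to be a gamma density with shape ''k'' and rate ''b'', for which the integral evaluates to ''k'' ''b''^''k'' / (''x'' + ''b'')^(''k''+1), a Lomax density. The function names, the midpoint rule and the truncation point `upper` are illustrative choices.

```python
import math


def exponential_pdf(x, rate):
    """Component density p(x; a) with parameter a = rate, for x >= 0."""
    return rate * math.exp(-rate * x)


def gamma_pdf(a, shape, rate):
    """Mixing density w(a): a gamma density with the given shape and rate, a > 0."""
    return rate**shape * a ** (shape - 1) * math.exp(-rate * a) / math.gamma(shape)


def compound_pdf(x, shape, rate, upper=60.0, steps=60_000):
    """Midpoint-rule approximation of f(x) = integral of w(a) p(x; a) da over (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        a = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += gamma_pdf(a, shape, rate) * exponential_pdf(x, a)
    return total * h


def lomax_pdf(x, shape, rate):
    """Closed form of the same integral for this particular pair of densities."""
    return shape * rate**shape / (x + rate) ** (shape + 1)


shape, rate = 3.0, 2.0  # illustrative values only
for x in (0.0, 1.0, 5.0):
    print(x, compound_pdf(x, shape, rate), lomax_pdf(x, shape, rate))
```

The numerical and closed-form columns should agree closely, which is a quick sanity check that integrating a family of densities against a mixing density again yields a density in ''x''.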


Mixtures within a parametric family

The mixture components are often not arbitrary probability distributions, but instead are members of a parametric family (such as normal distributions), with different values for a parameter or parameters. In such cases, assuming that it exists, the density can be written in the form of a sum as:
: f(x; a_1, \ldots , a_n) = \sum_{i=1}^n \, w_i \, p(x;a_i)
for one parameter, or
: f(x; a_1, \ldots , a_n, b_1, \ldots , b_n) = \sum_{i=1}^n \, w_i \, p(x;a_i,b_i)
for two parameters, and so forth.


Properties


Convexity

A general linear combination of probability density functions is not necessarily a probability density, since it may be negative or it may integrate to something other than 1. However, a convex combination of probability density functions preserves both of these properties (non-negativity and integrating to 1), and thus mixture densities are themselves probability density functions.
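A small numerical illustration of this point, under assumed components N(0, 1) and N(0, 4): the coefficients (1.5, -0.5) sum to 1, so the combination still integrates to 1, yet it becomes negative in the tails, whereas the convex weights (0.5, 0.5) give a valid density. The helper names and the chosen coefficients are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def combination(x, coeffs, params):
    """Linear combination sum_i c_i p_i(x) of normal densities."""
    return sum(c * normal_pdf(x, mu, s) for c, (mu, s) in zip(coeffs, params))


def trapezoid_area(coeffs, params, lo=-30.0, hi=30.0, steps=6000):
    """Trapezoid-rule estimate of the integral of the combination over [lo, hi]."""
    h = (hi - lo) / steps
    ys = [combination(lo + h * i, coeffs, params) for i in range(steps + 1)]
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))


params = [(0.0, 1.0), (0.0, 2.0)]  # (mean, standard deviation), illustrative
convex = [0.5, 0.5]                # non-negative, sums to 1
non_convex = [1.5, -0.5]           # sums to 1, but one coefficient is negative

print("convex at x=4:    ", combination(4.0, convex, params))
print("non-convex at x=4:", combination(4.0, non_convex, params))
print("areas:", trapezoid_area(convex, params), trapezoid_area(non_convex, params))
```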


Moments

Let ''X''1, ..., ''X''''n'' denote random variables from the ''n'' component distributions, and let ''X'' denote a random variable from the mixture distribution. Then, for any function ''H''(·) for which \operatorname{E}[H(X_i)] exists, and assuming that the component densities ''pi''(''x'') exist,
: \begin{align} \operatorname{E}[H(X)] & = \int_{-\infty}^\infty H(x) \sum_{i=1}^n w_i p_i(x) \, dx \\ & = \sum_{i=1}^n w_i \int_{-\infty}^\infty p_i(x) H(x) \, dx = \sum_{i=1}^n w_i \operatorname{E}[H(X_i)]. \end{align}
The ''j''th moment about zero (i.e. choosing H(x) = x^j) is simply a weighted average of the ''j''th moments of the components. Moments about the mean (i.e. choosing H(x) = (x - \mu)^j) involve a binomial expansion:
: \begin{align} \operatorname{E}[(X - \mu)^j] & = \sum_{i=1}^n w_i \operatorname{E}[(X_i - \mu_i + \mu_i - \mu)^j] \\ & = \sum_{i=1}^n w_i \sum_{k=0}^j \binom{j}{k} (\mu_i - \mu)^{j-k} \operatorname{E}[(X_i - \mu_i)^k], \end{align}
where ''μi'' denotes the mean of the ''i''th component.

In the case of a mixture of one-dimensional distributions with weights ''wi'', means ''μi'' and variances ''σi''2, the total mean and variance will be:
: \operatorname{E}[X] = \mu = \sum_{i=1}^n w_i \mu_i ,
: \begin{align} \operatorname{E}[(X - \mu)^2] & = \sigma^2 \\ & = \operatorname{E}[X^2] - \mu^2 & (\text{standard variance reformulation}) \\ & = \left(\sum_{i=1}^n w_i \operatorname{E}[X_i^2]\right) - \mu^2 \\ & = \sum_{i=1}^n w_i(\sigma_i^2 + \mu_i^2) - \mu^2 & (\text{from } \sigma_i^2 = \operatorname{E}[X_i^2] - \mu_i^2, \text{ therefore } \operatorname{E}[X_i^2] = \sigma_i^2 + \mu_i^2.) \end{align}
These relations highlight the potential of mixture distributions to display non-trivial higher-order moments such as skewness and kurtosis (fat tails) and multi-modality, even in the absence of such features within the components themselves. Marron and Wand (1992) give an illustrative account of the flexibility of this framework (http://projecteuclid.org/euclid.aos/1176348653).
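The mean, variance and binomial-expansion formulas translate directly into code. The following is a sketch that assumes normal components, so that the component central moments E[(''Xi'' - ''μi'')^''k''] are known in closed form (zero for odd ''k'', ''σ''^''k'' (''k''-1)!! for even ''k''); the function names and the example mixture are illustrative.

```python
import math


def mixture_mean_variance(weights, means, variances):
    """Mean and variance of a mixture from component means and variances."""
    mu = sum(w * m for w, m in zip(weights, means))
    # E[X^2] = sum_i w_i (sigma_i^2 + mu_i^2)
    second_moment = sum(w * (v + m * m) for w, m, v in zip(weights, means, variances))
    return mu, second_moment - mu * mu


def normal_central_moment(k, sigma):
    """E[(X_i - mu_i)^k] for a normal component: 0 if k is odd, sigma^k (k-1)!! if even."""
    if k % 2 == 1:
        return 0.0
    double_factorial = 1.0
    for odd in range(1, k, 2):
        double_factorial *= odd
    return double_factorial * sigma**k


def mixture_central_moment(j, weights, means, sigmas):
    """j-th central moment of a normal mixture via the binomial expansion."""
    mu = sum(w * m for w, m in zip(weights, means))
    total = 0.0
    for w, m, s in zip(weights, means, sigmas):
        for k in range(j + 1):
            total += w * math.comb(j, k) * (m - mu) ** (j - k) * normal_central_moment(k, s)
    return total


# Illustrative equal mixture of N(-1, 1) and N(1, 1).
weights = [0.5, 0.5]
means = [-1.0, 1.0]
sigmas = [1.0, 1.0]

mean, variance = mixture_mean_variance(weights, means, [s * s for s in sigmas])
fourth = mixture_central_moment(4, weights, means, sigmas)
kurtosis = fourth / variance**2
print(mean, variance, fourth, kurtosis)
```

For this assumed mixture the kurtosis comes out below 3, the value for a single normal distribution, even though each component is itself normal.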


Modes

The question of multimodality is simple for some cases, such as mixtures of exponential distributions: all such mixtures are unimodal. However, for the case of mixtures of normal distributions, it is a complex one. Conditions for the number of modes in a multivariate normal mixture are explored by Ray & Lindsay, extending earlier work on univariate (Robertson CA, Fryer JG (1969) Some descriptive properties of normal mixtures. Skand Aktuarietidskr 137–146) and multivariate distributions.

Here the problem of evaluation of the modes of an ''n'' component mixture in a ''D'' dimensional space is reduced to identification of critical points (local minima, maxima and saddle points) on a manifold referred to as the ridgeline surface, which is the image of the ridgeline function
: x^{*}(\alpha) = \left[ \sum_{i=1}^{n} \alpha_i \Sigma_i^{-1} \right]^{-1} \times \left[ \sum_{i=1}^{n} \alpha_i \Sigma_i^{-1} \mu_i \right],
where \alpha belongs to the (n-1)-dimensional standard simplex
: \mathcal{S}_n = \left\{ \alpha \in \mathbb{R}^n : \alpha_i \in [0,1], \ \sum_{i=1}^n \alpha_i = 1 \right\}
and \Sigma_i \in R^{D \times D},\, \mu_i \in R^D correspond to the covariance and mean of the ''i''th component. Ray & Lindsay consider the case in which n-1 < D, showing a one-to-one correspondence of modes of the mixture and those on the ridge elevation function h(\alpha)=q(x^*(\alpha)); thus one may identify the modes by solving \frac{d h(\alpha)}{d \alpha} = 0 with respect to \alpha and determining the value x^*(\alpha).

Using graphical tools, the potential multi-modality of mixtures with a small number of components ''n'' is demonstrated; in particular it is shown that the number of modes may exceed n and that the modes may not be coincident with the component means. For two components they develop a graphical tool for analysis by instead solving the aforementioned differential with respect to the first mixing weight w_1 (which also determines the second mixing weight through w_2 = 1-w_1) and expressing the solutions as a function \Pi(\alpha), \,\alpha \in [0,1], so that the number and location of modes for a given value of w_1 corresponds to the number of intersections of the graph on the line \Pi(\alpha)=w_1. This in turn can be related to the number of oscillations of the graph and therefore to solutions of \frac{d \Pi(\alpha)}{d \alpha} = 0, leading to an explicit solution for the case of a two component mixture with \Sigma_1 = \Sigma_2 = \Sigma (sometimes called a homoscedastic mixture) given by
: 1 - \alpha(1-\alpha) d_M(\mu_1, \mu_2, \Sigma)^2
where d_M(\mu_1,\mu_2,\Sigma) = \sqrt{(\mu_2-\mu_1)^\mathsf{T} \Sigma^{-1} (\mu_2-\mu_1)} is the Mahalanobis distance between \mu_1 and \mu_2. Since the above is quadratic it follows that in this instance there are at most two modes irrespective of the dimension or the weights.

For normal mixtures with general n>2 and D>1, a lower bound for the maximum number of possible modes and, conditionally on the assumption that the maximum number is finite, an upper bound are known. For those combinations of n and D for which the maximum number is known, it matches the lower bound.


Examples


Two normal distributions

Simple examples can be given by a mixture of two normal distributions. (See Multimodal distribution, section "Mixture of two normal distributions", for more details.)

Given an equal (50/50) mixture of two normal distributions with the same standard deviation and different means (homoscedastic), the overall distribution will exhibit low kurtosis relative to a single normal distribution: the means of the subpopulations fall on the shoulders of the overall distribution. If sufficiently separated, namely by twice the (common) standard deviation, so \left|\mu_1 - \mu_2\right| > 2\sigma, these form a bimodal distribution; otherwise it simply has a wide peak. The variation of the overall population will also be greater than the variation of the two subpopulations (due to spread from different means), and thus exhibits overdispersion relative to a normal distribution with fixed variation \sigma, though it will not be overdispersed relative to a normal distribution with variation equal to variation of the overall population.

Alternatively, given two subpopulations with the same mean and different standard deviations, the overall population will exhibit high kurtosis, with a sharper peak and heavier tails (and correspondingly shallower shoulders) than a single distribution.

Figure: Univariate mixture distribution, showing bimodal distribution.
Figure: Multivariate mixture distribution, showing four modes.
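The separation threshold for an equal mixture can be checked by counting local maxima of the density on a fine grid. This is a rough numerical sketch, not a proof: the grid range, the step count and the two example separations (1.5σ and 3σ) are illustrative assumptions, and the function names are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def equal_mixture_pdf(x, mu1, mu2, sigma):
    """50/50 mixture of N(mu1, sigma^2) and N(mu2, sigma^2)."""
    return 0.5 * normal_pdf(x, mu1, sigma) + 0.5 * normal_pdf(x, mu2, sigma)


def count_modes(mu1, mu2, sigma, lo=-10.0, hi=10.0, steps=2000):
    """Count strict local maxima of the mixture density on an evenly spaced grid."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [equal_mixture_pdf(x, mu1, mu2, sigma) for x in xs]
    return sum(1 for i in range(1, steps) if ys[i - 1] < ys[i] > ys[i + 1])


sigma = 1.0
close_modes = count_modes(-0.75, 0.75, sigma)  # separation 1.5 * sigma
far_modes = count_modes(-1.5, 1.5, sigma)      # separation 3.0 * sigma
print("separation 1.5 sigma:", close_modes, "mode(s)")
print("separation 3.0 sigma:", far_modes, "mode(s)")
```

A grid search like this only detects maxima that are resolved by the grid spacing, so separations very close to 2σ, where the peak is nearly flat, would need a finer grid or an analytic treatment.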


A normal and a Cauchy distribution

The following example is adapted from Hampel, who credits John Tukey. Consider a mixture distribution ''F'' that combines a normal distribution with a Cauchy distribution, giving the Cauchy component only a very small weight \varepsilon:
: F(x) = (1 - \varepsilon)\,(\text{normal}) + \varepsilon\,(\text{Cauchy}).
The mean of i.i.d. observations from ''F'' behaves "normally" except for exorbitantly large samples, although the mean of ''F'' does not even exist.


Applications

Mixture densities are complicated densities expressible in terms of simpler densities (the mixture components). They are used both because they provide a good model for certain data sets (where different subsets of the data exhibit different characteristics and can best be modeled separately), and because they can be more mathematically tractable, since the individual mixture components can be more easily studied than the overall mixture density.

Mixture densities can be used to model a statistical population with subpopulations, where the mixture components are the densities on the subpopulations, and the weights are the proportions of each subpopulation in the overall population.

Mixture densities can also be used to model experimental error or contamination: one assumes that most of the samples measure the desired phenomenon, with some samples from a different, erroneous distribution. Parametric statistics that assume no error often fail on such mixture densities (for example, statistics that assume normality often fail disastrously in the presence of even a few outliers), and instead one uses robust statistics.

In meta-analysis of separate studies, study heterogeneity causes the distribution of results to be a mixture distribution, and leads to overdispersion of results relative to predicted error. For example, in a statistical survey, the margin of error (determined by sample size) predicts the sampling error and hence the dispersion of results on repeated surveys. The presence of study heterogeneity (studies have different sampling bias) increases the dispersion relative to the margin of error.


See also

* Compound distribution
* Contaminated normal distribution
* Convex combination
* Expectation-maximization (EM) algorithm
* Not to be confused with: list of convolutions of probability distributions
* Product distribution


Mixture

* Mixture (probability)
* Mixture model


Hierarchical models

* Graphical model
* Hierarchical Bayes model

